Unified Inference Framework for Single and Multi-Player Performative Prediction: Method and Asymptotic Optimality

Zhang, Zhixian, Hou, Xiaotian, Zhang, Linjun

arXiv.org Machine Learning

Performative prediction characterizes environments where predictive models alter the very data distributions they aim to forecast, triggering complex feedback loops. While prior research treats single-agent and multi-agent performativity as distinct phenomena, this paper introduces a unified statistical inference framework that bridges these contexts, treating the former as a special case of the latter. Our contribution is two-fold. First, we put forward the Repeated Risk Minimization (RRM) procedure for estimating the performatively stable point, and we establish a rigorous inferential theory that proves its asymptotic normality and confirms its asymptotic efficiency. Second, for the performatively optimal point, we introduce a novel two-step plug-in estimator that integrates the idea of Recalibrated Prediction Powered Inference (RePPI) with importance sampling, and we derive central limit theorems for both the underlying distributional parameters and the plug-in estimator. The theoretical analysis shows that our estimator attains the semiparametric efficiency bound and remains robust under mild distributional misspecification. This work provides a principled toolkit for reliable estimation and decision-making in dynamic, performative environments.
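As a minimal illustration of the RRM idea (not the authors' estimator or their inference theory), the sketch below assumes a hypothetical location family in which deploying θ shifts the data to D(θ) = N(μ + εθ, 1). Under squared loss, each risk-minimization step is simply the sample mean of fresh data drawn at the current deployment, and for |ε| < 1 the iterates approach the performatively stable point θ_PS = μ / (1 - ε). The names `sample_from_deployment`, `mu`, and `eps` are illustrative assumptions.

```python
import random

def sample_from_deployment(theta, n, mu, eps, rng):
    """Hypothetical performative distribution D(theta) = N(mu + eps * theta, 1)."""
    return [rng.gauss(mu + eps * theta, 1.0) for _ in range(n)]

def repeated_risk_minimization(theta0, mu, eps, n=20000, iters=25, seed=0):
    """RRM under squared loss: theta_{t+1} = argmin_theta E_{Z ~ D(theta_t)} (Z - theta)^2.

    The empirical minimizer of the squared loss is the sample mean, so each
    step deploys theta_t, collects n fresh samples, and averages them.
    """
    rng = random.Random(seed)
    theta = theta0
    for _ in range(iters):
        data = sample_from_deployment(theta, n, mu, eps, rng)
        theta = sum(data) / len(data)
    return theta

mu, eps = 1.0, 0.5
theta_hat = repeated_risk_minimization(theta0=0.0, mu=mu, eps=eps)
theta_ps = mu / (1.0 - eps)  # fixed point of the map theta -> mu + eps * theta
print(f"RRM estimate {theta_hat:.3f}, stable point {theta_ps:.3f}")
```

In this toy the sampling error of each round is contracted by |ε| but refreshed every round, so the final iterate fluctuates around θ_PS at the 1/√n scale; the toy does not establish the asymptotic normality or efficiency results described in the abstract.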


Data-Driven Information-Theoretic Causal Bounds under Unmeasured Confounding

Jung, Yonghan, Kang, Bogyeong

arXiv.org Machine Learning

We develop a data-driven information-theoretic framework for sharp partial identification of causal effects under unmeasured confounding. Existing approaches often rely on restrictive assumptions, such as bounded or discrete outcomes; require external inputs (for example, instrumental variables, proxies, or user-specified sensitivity parameters); necessitate full structural causal model specifications; or focus solely on population-level averages while neglecting covariate-conditional treatment effects. We overcome all four limitations simultaneously by establishing novel information-theoretic, data-driven divergence bounds. Our key theoretical contribution shows that the f-divergence between the observational distribution P(Y | A = a, X = x) and the interventional distribution P(Y | do(A = a), X = x) is upper bounded by a function of the propensity score alone. This result enables sharp partial identification of conditional causal effects directly from observational data, without requiring external sensitivity parameters, auxiliary variables, full structural specifications, or outcome boundedness assumptions. For practical implementation, we develop a semiparametric estimator satisfying Neyman orthogonality (Chernozhukov et al., 2018), which ensures square-root-n consistent inference even when nuisance functions are estimated using flexible machine learning methods. Simulation studies and real-world data applications, implemented in the GitHub repository (https://github.com/yonghanjung/Information-Theretic-Bounds), demonstrate that our framework provides tight and valid causal bounds across a wide range of data-generating processes.
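One elementary way to see how a propensity score alone can bound the gap between these two distributions is sketched here. This is an illustration under stated assumptions, not the paper's f-divergence bound. Suppose (X, U) is a valid adjustment set for a hypothetical unmeasured confounder U. Then P(y | do(a), x) = Σ_u P(y | a, x, u) P(u | x) ≥ Σ_u P(y | a, x, u) P(u, a | x) = e(x) P(y | a, x), where e(x) = P(A = a | X = x). The likelihood ratio of the observational to the interventional distribution is therefore at most 1/e(x), so the KL divergence (one particular f-divergence) is at most -log e(x). The code below checks this on a made-up discrete model with binary U, A, and Y at a fixed covariate value; all probabilities are invented for illustration.

```python
import math

# Hypothetical discrete model at a fixed covariate value x (all numbers made up).
p_u = {0: 0.6, 1: 0.4}              # P(U = u | x)
p_a1_given_u = {0: 0.2, 1: 0.9}     # P(A = 1 | x, u)
p_y1_given_a1_u = {0: 0.3, 1: 0.8}  # P(Y = 1 | A = 1, x, u)

# Propensity score e(x) = P(A = 1 | x).
e = sum(p_u[u] * p_a1_given_u[u] for u in p_u)

# Observational P(Y = 1 | A = 1, x): confounder weighted by P(u | A = 1, x).
p_obs = sum(p_u[u] * p_a1_given_u[u] * p_y1_given_a1_u[u] for u in p_u) / e
# Interventional P(Y = 1 | do(A = 1), x): confounder weighted by P(u | x).
p_int = sum(p_u[u] * p_y1_given_a1_u[u] for u in p_u)

def kl_bernoulli(p, q):
    """KL(Bern(p) || Bern(q)) in nats."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

kl = kl_bernoulli(p_obs, p_int)
bound = -math.log(e)
max_ratio = max(p_obs / p_int, (1 - p_obs) / (1 - p_int))
print(f"e(x)={e:.3f}  KL={kl:.4f}  bound={bound:.4f}  max ratio={max_ratio:.3f}")
```

Here the confounder makes treated units look better than under intervention (0.675 versus 0.5), yet the divergence stays well inside the propensity-only bound.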


Surfing: Iterative Optimization Over Incrementally Trained Deep Networks

Neural Information Processing Systems

We investigate a sequential optimization procedure to minimize the empirical risk functional $f_{\hat\theta}(x) = \frac{1}{2}\|G_{\hat\theta}(x) - y\|^2$ for certain families of deep networks $G_{\theta}(x)$. The approach is to optimize a sequence of objective functions that use network parameters obtained during different stages of the training process. We show that when the network is initialized with random parameters $\theta_0$, the objective $f_{\theta_0}(x)$ is "nice" and easy to optimize with gradient descent. As learning is carried out, we obtain a sequence of generative networks $x \mapsto G_{\theta_t}(x)$ and associated risk functions $f_{\theta_t}(x)$, where $t$ indicates a stage of stochastic gradient descent during training. Since the network parameters change only slightly at each step, the surface evolves slowly and can be incrementally optimized. The algorithm is formalized and analyzed for a family of expansive networks. We call the procedure "surfing" since it rides along the peak of the evolving (negative) empirical risk function, starting from a smooth surface at the beginning of learning and ending with a wavy nonconvex surface after learning is complete. Experiments show that surfing can find the global optimum and can be used for compressed sensing, even when direct gradient descent on the final learned network fails.
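The surfing loop can be sketched as warm-started gradient descent over a sequence of parameter snapshots θ_0, ..., θ_T. This is a hypothetical scalar toy, not the paper's expansive networks. It uses G_θ(x) = a·tanh(b·x) with θ = (a, b), a made-up "training trajectory" in which b grows linearly, and a fixed target y. Each stage runs a few hundred gradient steps on f_{θ_t}(x) = ½(G_{θ_t}(x) - y)², starting from the previous stage's solution. Because this toy objective is monotone in x, it shows only the mechanics of surfing, not its advantage over direct gradient descent on a nonconvex landscape.

```python
import math

def G(theta, x):
    """Hypothetical scalar 'network' G_theta(x) = a * tanh(b * x), theta = (a, b)."""
    a, b = theta
    return a * math.tanh(b * x)

def grad_f(theta, x, y):
    """d/dx of f_theta(x) = 0.5 * (G_theta(x) - y)**2, by the chain rule."""
    a, b = theta
    t = math.tanh(b * x)
    return (a * t - y) * a * b * (1.0 - t * t)

def surf(thetas, y, x0=0.0, lr=0.05, steps=300):
    """Warm-started gradient descent over parameter snapshots theta_0, ..., theta_T."""
    x = x0
    for theta in thetas:
        # Re-optimize the slowly changing surface f_{theta_t}, starting from
        # the minimizer found for the previous snapshot.
        for _ in range(steps):
            x -= lr * grad_f(theta, x, y)
    return x

# Made-up "training trajectory": a is fixed and b grows from 0.5 to 3.0.
thetas = [(2.0, 0.5 + 2.5 * t / 19) for t in range(20)]
y = 1.0
x_hat = surf(thetas, y)
print(f"x_hat={x_hat:.5f}, residual={G(thetas[-1], x_hat) - y:.2e}")
```

For this toy, the exact minimizer at the final snapshot is atanh(0.5)/3 ≈ 0.1831. The step size is chosen so that descent stays stable at the sharpest (final) snapshot.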





On Quantification of Borrowing of Information in Hierarchical Bayesian Models

Ghosh, Prasenjit, Bhattacharya, Anirban, Pati, Debdeep

arXiv.org Machine Learning

In this work, we offer a thorough analytical investigation into the role of shared hyperparameters in a hierarchical Bayesian model, examining their impact on information borrowing and posterior inference. Our approach is rooted in a non-asymptotic framework, where observations are drawn from a mixed-effects model and a Gaussian distribution is assumed for the true effect generator. We consider nested hierarchical prior distributions to capture these effects and use the posterior means for Bayesian estimation. To quantify the effect of information borrowing, we propose an integrated risk measure relative to the true data-generating distribution. Our analysis reveals that the Bayes estimator for the model with a deeper hierarchy performs better, provided that the unknown random effects are correlated through a compound symmetric structure. Our work also identifies necessary and sufficient conditions for this model to outperform the one nested within it. We further obtain sufficient conditions that hold when the correlation structure is perturbed. Our study suggests that the model with a deeper hierarchy tends to outperform the nested model unless the true data-generating distribution favors sufficiently independent groups. These findings have significant implications for Bayesian modeling, and we believe they will be of interest to researchers across a wide range of fields.
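The kind of comparison described above can be illustrated in a deliberately simple setting. This setting is an assumption of the sketch, not the paper's mixed-effects model. Observations are y = θ + noise with noise variance σ². A "shallow" prior takes θ_i ~ N(0, τ²) independently. A "deeper" prior adds a shared mean μ ~ N(0, γ²), so θ_i | μ ~ N(μ, τ²). The true effects follow θ ~ N(0, Σ) with compound-symmetric Σ = s²[(1 - ρ)I + ρ11ᵀ].

Every covariance involved then has the form αI + β11ᵀ. Such a matrix has eigenvalue α + nβ along the all-ones direction and α on its (n - 1)-dimensional orthogonal complement. Both posterior means are linear shrinkers y ↦ C(C + σ²I)⁻¹y, with C the prior covariance, so their integrated risks reduce to scalar formulas. The two estimators differ only along the all-ones direction, which is where the shared hyperparameter borrows information across groups. All parameter values below are made up.

```python
def eig_cs(alpha, beta, n):
    """Eigenvalues of alpha*I + beta*J (J = all-ones): (along 1-vector, on complement)."""
    return alpha + n * beta, alpha

def direction_risk(w, lam, sigma2):
    """E[(w*(theta + eps) - theta)^2] for theta ~ N(0, lam), eps ~ N(0, sigma2)."""
    return (1 - w) ** 2 * lam + w ** 2 * sigma2

def integrated_risk(prior_alpha, prior_beta, s2, rho, sigma2, n):
    """E||C(C + sigma2 I)^{-1} y - theta||^2 when the truth is compound symmetric."""
    c_one, c_perp = eig_cs(prior_alpha, prior_beta, n)
    lam_one, lam_perp = eig_cs(s2 * (1 - rho), s2 * rho, n)
    w_one = c_one / (c_one + sigma2)    # shrinkage along the all-ones direction
    w_perp = c_perp / (c_perp + sigma2)  # shrinkage on the complement
    return (direction_risk(w_one, lam_one, sigma2)
            + (n - 1) * direction_risk(w_perp, lam_perp, sigma2))

def compare(rho, n=10, sigma2=1.0, tau2=1.0, gamma2=1.0, s2=1.0):
    shallow = integrated_risk(tau2, 0.0, s2, rho, sigma2, n)  # theta_i ~ N(0, tau2)
    deep = integrated_risk(tau2, gamma2, s2, rho, sigma2, n)  # adds mu ~ N(0, gamma2)
    return shallow, deep

for rho in (0.0, 0.5):
    shallow, deep = compare(rho)
    print(f"rho={rho}: shallow risk={shallow:.3f}, deeper risk={deep:.3f}")
```

With these values, the deeper model has higher risk than the shallow one at ρ = 0, where groups are independent and the shallow prior is correctly specified. At ρ = 0.5 the deeper model has lower risk. This echoes the abstract's qualitative message without reproducing its conditions.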